Brand Visibility in Packaging: A Deep Learning Approach for Logo Detection, Saliency-Map Prediction, and Logo Placement Analysis
Abstract: In the highly competitive arena of product marketing, the visibility of brand logos on packaging plays a crucial role in shaping consumer perception and directly influences a product's success. This paper introduces a comprehensive framework for measuring the attention a brand logo receives on a packaging design. The proposed method consists of three steps. The first step leverages YOLOv8 for precise logo detection across two prominent datasets, FoodLogoDet-1500 and LogoDet-3K. The second step models the user's visual attention with a novel saliency prediction model tailored to the packaging context; this model combines visual elements with text maps in a transformer-based architecture to predict user attention maps. In the third step, the framework integrates logo detection with saliency map generation to produce a comprehensive brand attention score. The effectiveness of the proposed method is assessed module by module, ensuring a thorough evaluation of each component. Comparisons with state-of-the-art logo-detection and saliency-prediction models show the superiority of the proposed methods. To investigate the robustness of the brand attention score, we collected a unique dataset and used it to examine previous psychophysical hypotheses related to brand visibility; the results show that the brand attention score is consistent with all previous studies. We also introduce seven new hypotheses examining the impact of position, orientation, the presence of a person, and other visual elements on brand attention. This research marks a significant stride at the intersection of cognitive psychology, computer vision, and marketing, paving the way for advanced, consumer-centric packaging designs.
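The third step can be illustrated with a minimal sketch. The paper's exact formulation of the brand attention score is not reproduced here; the definition below (fraction of predicted saliency mass falling inside detected logo boxes), the function name `brand_attention_score`, and the toy inputs are all assumptions for illustration only.

```python
import numpy as np

def brand_attention_score(saliency_map, logo_boxes):
    """Fraction of predicted attention that falls inside detected logo regions.

    saliency_map: 2-D array of non-negative attention values, shape (H, W),
                  as produced by a saliency prediction model.
    logo_boxes:   list of (x1, y1, x2, y2) pixel boxes from a logo detector.
    """
    total = saliency_map.sum()
    if total == 0:
        return 0.0
    # Union of all detected logo regions as a boolean mask.
    mask = np.zeros(saliency_map.shape, dtype=bool)
    for x1, y1, x2, y2 in logo_boxes:
        mask[y1:y2, x1:x2] = True
    return float(saliency_map[mask].sum() / total)

# Toy example: strong attention on the logo, weaker attention elsewhere.
sal = np.zeros((100, 100))
sal[10:30, 10:40] = 1.0   # saliency over the (hypothetical) logo area
sal[60:80, 60:90] = 0.5   # attention drawn by other packaging elements
score = brand_attention_score(sal, [(10, 10, 40, 30)])
```

Under this toy input, the logo region holds two-thirds of the total saliency mass, so the score is about 0.67; a score near 1 would indicate that nearly all predicted attention lands on the logo.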
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shukla, P., Singh, J., Wang, W.: The influence of creative packaging design on customer motivation to process and purchase decisions. Journal of Business Research 147, 338–347 (2022) https://doi.org/10.1016/j.jbusres.2022.04.026 Riaz and Ghafoor [2019] Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. 
Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. 
Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. 
[2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. 
[2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study.
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al.
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Méndez, J., Oubiña, J., Rubio, N.: The relative importance of brand-packaging, price and taste in affecting brand preferences. British Food Journal 113, 1229–1251 (2011) https://doi.org/10.1108/00070701111177665 Stewart [1995] Stewart, B.: Packaging as an Effective Marketing Tool. CRC Press, ??? (1995) Shukla et al. [2022] Shukla, P., Singh, J., Wang, W.: The influence of creative packaging design on customer motivation to process and purchase decisions. Journal of Business Research 147, 338–347 (2022) https://doi.org/10.1016/j.jbusres.2022.04.026 Riaz and Ghafoor [2019] Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. 
The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Stewart, B.: Packaging as an Effective Marketing Tool. CRC Press, ??? (1995) Shukla et al. [2022] Shukla, P., Singh, J., Wang, W.: The influence of creative packaging design on customer motivation to process and purchase decisions. Journal of Business Research 147, 338–347 (2022) https://doi.org/10.1016/j.jbusres.2022.04.026 Riaz and Ghafoor [2019] Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. 
[2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. 
IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. 
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. 
[2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. 
[2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. 
[2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. 
British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research: Official Journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study.
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al.
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. 
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity.
Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval.
In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research: official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study.
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. 
In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription.
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL - Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp.
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework.
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye-tracking in Audiovisual Translation, pp. 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. 
[2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. 
[2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al.
[2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study.
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. 
[2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. 
[2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al.
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design.
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. 
[2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion.
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al.
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al.
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al.
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks.
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network.
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 
4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 
1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. 
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al.
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Shukla, P., Singh, J., Wang, W.: The influence of creative packaging design on customer motivation to process and purchase decisions. Journal of Business Research 147, 338–347 (2022) https://doi.org/10.1016/j.jbusres.2022.04.026 Riaz and Ghafoor [2019] Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. 
[2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. 
Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. 
[2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. 
[2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. 
[2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al.
[2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection.
In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al.
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. 
[2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. 
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection.
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al.
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al.
[2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al.
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
[2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities.
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. 
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Riaz, T., Ghafoor, M.: Strategic logo placement on packaging - using conceptual metaphors of power in packaging – evidence from pakistan. Procedia Computer Science 158, 582–589 (2019) https://doi.org/10.1016/j.procs.2019.09.092 Dong and Gleim [2018] Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al.
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al.
[2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al.
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. 
[2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. 
British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. 
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. 
Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. 
In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction.
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. 
In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Dong, R., Gleim, M.: High or low: The impact of brand logo location on consumers product perceptions. Food Quality and Preference 69 (2018) https://doi.org/10.1016/j.foodqual.2018.05.003 Rebollar et al. [2015] Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. 
IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. 
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al.
[2017] Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities.
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text.
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. 
[2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. 
In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. 
Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. 
[2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri-study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research: official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2
Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al.
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al.
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al.
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Rebollar, R., Lidón, I., Martin Vallejo, F., Puebla, M.: The identification of viewing patterns of chocolate snack packages using eye-tracking techniques. Food Quality and Preference 39, 251–258 (2015) https://doi.org/10.1016/j.foodqual.2014.08.002 Piqueras-Fiszman et al. [2013] Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. [2014] Raheem, A.R., Vishnu, P., Ahmed, A.M.: Impact of product packaging on consumer’s buying behavior. European journal of scientific research 122(2), 125–134 (2014) Girard et al. [2013] Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. 
Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research: Official Journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al.
[2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. 
Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Piqueras-Fiszman, B., Velasco, C., Salgado-Montejo, A., Spence, C.: Using combined eye tracking and word association in order to assess novel packaging solutions: A case study involving jam jars. Food Quality and Preference 28(1), 328–338 (2013) https://doi.org/10.1016/j.foodqual.2012.10.006 Raheem et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al.
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion.
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al.
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fMRI-study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research: Official Journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al.
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction.
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale.
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023)
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al.
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al.
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al.
[2020] Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al.
[2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. 
[2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 
965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023)
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al.
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn't always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208
Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023)
Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89
Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256
Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021)
Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057
Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066
Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2
Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon et al.
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri-study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy.
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016)
Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. 
In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al.
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al.
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009)
Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye-tracking in Audiovisual Translation, pp. 135–155 (2012)
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. 
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large-scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013)
- Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002
- Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn't always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208
- Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023)
- Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89
- Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256
- Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021)
- Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057
- Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066
- Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2
- Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
- Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
- Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
- Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
- Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
- Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
- Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
- Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
- Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You Only Look Once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019) https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: How is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. 
ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. 
IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. 
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
- Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019) https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: How is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye-tracking in Audiovisual Translation, pp. 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256
- Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021)
- Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057
- Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066
- Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2 (2009) https://doi.org/10.1167/9.10.2
- Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
- Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
- Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
- Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference (BMVC), pp. 115.1–115.12 (2015)
- Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
- Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
- Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014)
- Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)
- Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks (IJCNN), pp. 985–991 (2016)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing (VCIP), pp. 1–4 (2017)
- Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017)
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision (ECCV), pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in a system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. 
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Girard, T., Anitsal, M.M., Anitsal, I.: The role of logos in building brand awareness and performance: Implications for entrepreneurs. The Entrepreneurial Executive 18, 7 (2013) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. 
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al.
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al.
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Krishna et al. [2017] Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn't always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al.
[2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection.
In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al.
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics.
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. 
[2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion.
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al.
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI-study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research: official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al.
[2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks.
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. 
In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. 
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. 
[2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities.
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al.
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. 
In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al.
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription.
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
- Krishna, A., Cian, L., Aydınoğlu, N.: Sensory aspects of package design. Journal of Retailing 93, 43–54 (2017) https://doi.org/10.1016/j.jretai.2016.12.002 Otterbring et al. [2013] Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al.
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al.
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al.
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion.
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection.
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. 
Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. 
[2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. 
Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription.
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp.
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework.
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging.
Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. 
[2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. 
Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. 
[2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. 
Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction.
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. 
[2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. 
[2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al.
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 
115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. 
In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. 
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007)
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities.
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large-scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye-tracking in Audiovisual Translation, pp. 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text.
Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
- Otterbring, T., Shams, P., Wästlund, E., Gustafsson, A.: Left isn’t always right: Placement of pictorial and textual package elements. British Food Journal 115 (2013) https://doi.org/10.1108/BFJ-08-2011-0208 Hou et al. [2023] Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. 
[2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network.
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition.
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al.
[2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look.
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. 
Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. 
In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al.
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics.
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction.
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. 
The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. 
[2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. 
[2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al.
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al.
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion.
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. 
In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection.
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Hou, S., Li, J., Min, W., Hou, Q., Zhao, Y., Zheng, Y., Jiang, S.: Deep learning for logo detection: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 20(3), 1–23 (2023) Borji and Itti [2012] Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study.
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al.
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. 
[2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. 
[2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. 
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al.
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al.
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition.
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al.
[2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look.
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp.
4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 
1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al.
[2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al.
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 
419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence 35 (2012) https://doi.org/10.1109/TPAMI.2012.89 Hubert et al. [2008] Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. 
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? an fmri‐study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256 Alvino et al. [2021] Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. 
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. 
[2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. 
Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription.
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp.
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. 
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. 
In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al.
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 
115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. 
In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. 
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. 
In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. 
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. 
In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 
4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 
4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Hubert, M., Baecke, S., Kenning, P.: What they see is what they get? An fMRI study on neural correlates of attractive packaging. Journal of Consumer Behaviour 7, 342–359 (2008) https://doi.org/10.1002/cb.256
- Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in Psychology 12, 688713 (2021)
- Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057
- Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066
- Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2
- Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
- Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
- Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
- Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
- Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
- Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
- Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
- Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
- Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010). https://doi.org/10.1016/j.foodqual.2010.03.006
Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000). https://doi.org/10.1108/10610420010316339
In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al.
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al.
[2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large-scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Alvino, L., Constantinides, E., Lubbe, R.H.: Consumer neuroscience: Attentional preferences for wine labeling reflected in the posterior contralateral negativity. Frontiers in psychology 12, 688713 (2021) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & tobacco research : official journal of the Society for Research on Nicotine and Tobacco 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al. [2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. 
IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework.
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Maynard et al. [2018] Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057 Gofman et al.
[2009] Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. 
Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. 
In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. 
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. 
In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. 
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management Decision 44(6), 783–789 (2006)
- Maynard, O., McClernon, F., Oliver, J., Munafò, M.: Using neuroscience to inform tobacco control policy. Nicotine & Tobacco Research 21 (2018) https://doi.org/10.1093/ntr/nty057
- Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066
- Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2 (2009) https://doi.org/10.1167/9.10.2
- Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900
- Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006
- Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
- Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
- Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
- Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
- Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
- Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
- Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019) https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. 
Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. 
In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. 
Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Gofman, A., Moskowitz, H., Fyrbjork, J., Moskowitz, D., Mets, T.: Extending rule developing experimentation to perception of food packages with eye tracking. The Open Food Science Journal 3, 66–78 (2009) https://doi.org/10.2174/1874256400903010066 Pertzov et al. [2009] Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn.
In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al.
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. 
[2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. 
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. 
In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
Lin et al. [2017] Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al.
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al.
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. 
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Pertzov, Y., Avidan, G., Zohary, E.: Accumulation of visual information across multiple fixations. Journal of Vision 9(10), 2–2 (2009) https://doi.org/10.1167/9.10.2 Nagel et al. [2011] Nagel, R., Reutskaja, E., Camerer, C., Rangel, A.: Search dynamics in consumer choice under time pressure: An eye-tracking study. American Economic Review 101, 900–926 (2011) https://doi.org/10.1257/aer.101.2.900 Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks.
IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network.
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal - uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition.
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al.
[2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look.
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Ares and Deliza [2010] Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection.
In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Del Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al.
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. 
[2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al.
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics.
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
[2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 
4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023)
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye Tracking in Audiovisual Translation, pp. 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and Vision Computing 95, 103887 (2020)
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. 
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Ares, G., Deliza, R.: Studying the influence of package shape and colour on consumer expectations of milk desserts using word association and conjoint analysis. Food Quality and Preference 21, 930–937 (2010) https://doi.org/10.1016/j.foodqual.2010.03.006 Rettie and Brewer [2000] Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks.
In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al.
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling.
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al.
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339 Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. 
arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016)
580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. 
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. 
[2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities.
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 
4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 
4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Rettie, R., Brewer, C.: The verbal and visual components of package design. Journal of Product & Brand Management 9 (2000) https://doi.org/10.1108/10610420010316339
- Boia et al. [2015] Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
- Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
- Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
- Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
- Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
- Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics
- Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal - uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text.
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Boia, R., Florea, C., Florea, L.: Elliptical asift agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115–111512 (2015) Sahbi et al. [2013] Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. 
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al.
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al.
[2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Shen et al.
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Boia, R., Florea, C., Florea, L.: Elliptical ASIFT agglomeration in class prototype for logo detection. In: Proceedings of the British Machine Vision Conference, pp. 115.1–115.12 (2015)
- Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013)
- Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
- Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
- Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
- Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text.
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. 
[2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Sahbi, H., Ballan, L., Serra, G., Bimbo, A.: Context-dependent logo matching and recognition. IEEE Transactions on Image Processing 22(3), 1018–1031 (2013) Revaud et al. [2012] Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. 
Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription.
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp.
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks.
arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye-tracking in Audiovisual Translation, pp. 135–155 (2012)
Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Revaud, J., Douze, M., Schmid, C.: Correlation-based burstiness for logo retrieval. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 965–968 (2012)
Girshick et al. [2014] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Girshick [2015] Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher et al. Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark.
Journal of Visual Communication and Image Representation 81, 103356 (2021)
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
- Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick [2015] Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Girshick, R.B.: Fast r-cnn. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. 
[2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–788 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. 
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 
419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Girshick, R.B.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
- Ren et al. [2015] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015)
- Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023)
- Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL - Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
- Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21 (2010)
- Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. 
[2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. 
[2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework.
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Lin et al. [2017] Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection.
Displays 70, 102090 (2021)
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. 
[2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. 
https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection.
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Ren, S., He, K., Girshick, R.B., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2015) Hoi et al. [2015] Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015) Oliveira et al. [2016] Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al.
[2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al.
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large-scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision (ICCV), pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye Tracking in Audiovisual Translation, pp. 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Hoi, S.C.H., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q.: LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462 (2015)
- Oliveira, G., Frazão, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017)
- Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You Only Look Once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. 
[2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 
4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Lin et al. [2017] Lin, T.Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp.
4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023) Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction.
Neural Networks 129, 261–270 (2020)
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Oliveira, G., Frazao, X., Pimentel, A., Ribeiro, B.: Automatic graphic logo detection via fast region-based convolutional networks. In: International Joint Conference on Neural Networks, pp. 985–991 (2016) Li et al. [2017] Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al.
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 
4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 
779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network.
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics (2023)
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Li, Y., Shi, Q., Deng, J., Su, F.: Graphic logo detection with deep region-based convolutional networks. In: IEEE Visual Communications and Image Processing, pp. 1–4 (2017) Lin et al. [2017] Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. 
[2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. 
[2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106–998107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. 
In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look.
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. 
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
- Lin, T.Y., Dollar, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017) Meng et al. [2021] Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The open brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics (2023). https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network.
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
- Meng, Y., Hou, S., Wang, J., Jia, W., Zheng, Y., Karim, A.: An adaptive representation algorithm for multi-scale logo detection. Displays 70, 102090 (2021) Jin et al. [2020] Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020) Carion et al. [2020] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-H.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion.
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. 
Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. 
[2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Jin, X., Su, W., Zhang, R., He, Y., Xue, H.: The Open Brands dataset: Unified brand detection and recognition at scale. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4387–4391 (2020)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. 
arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. 
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015)
Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020) Velazquez et al. [2021] Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021) Hou et al. [2021] Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: Foodlogodet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021) Redmon et al. [2016] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. 
[2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Velazquez, D.A., Gonfaus, J.M., Rodríguez, P., Roca, F.X., Ozawa, S., Gonzalez, J.: Logo detection with no priors. IEEE Access 9, 106998–107011 (2021)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. 
[2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? 
dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). 
Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. 
[2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet-3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Hou, Q., Min, W., Wang, J., Hou, S., Zheng, Y., Jiang, S.: FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4670–4679 (2021)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). 
Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). 
https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. 
arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon and Farhadi [2017] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. [2023] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al.
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) [43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. 
In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. 
[2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 
419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017)
Bochkovskiy et al. [2020] Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
[43] Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
Linardos et al. 
[2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 
2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Bochkovskiy, A., Wang, C.Y., Liao, H.: Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020) Wang et al. [2023] Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) Jocher et al. Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling.
In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al.
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large-scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. 
In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. 
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
- Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček and Chaloupka [2021] Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
- Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
- Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities.
In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. 
[2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? 
dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. 
[2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? 
dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Jocher, G., Chaurasia, A., Qiu, J.: YOLO by Ultralytics. https://github.com/ultralytics/ultralytics
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020)
- Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435. Springer (2020)
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97. Springer (2020)
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. 
IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 
425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. 
In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management Decision 44(6), 783–789 (2006)
- Paleček, K., Chaloupka, J.: Logo detection and identification in system for audio-visual broadcast transcription. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pp. 357–360 (2021) Kroner et al. [2020] Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. 
Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. 
Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. 
Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jia, S., Bruce, N.D.: Eml-net: An expandable multi-layer network for saliency prediction. Image and vision computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. 
[2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: Tempsal-uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. 
[2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? 
dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kümmerer, M., Wallis, T.S., Bethge, M.: Deepgaze ii: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Linardos, A., Kümmerer, M., Press, O., Bethge, M.: Deepgaze iie: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). 
Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. 
[2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder–decoder network for visual saliency prediction. Neural Networks 129, 261–270 (2020) Jia and Bruce [2020] Jia, S., Bruce, N.D.: EML-NET: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020) Aydemir et al. [2023] Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023) Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016) Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006) Cao et al.
[2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Jia, S., Bruce, N.D.: EML-Net: An expandable multi-layer network for saliency prediction. Image and Vision Computing 95, 103887 (2020)
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019) https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. 
[2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
- Aydemir, B., Hoffstetter, L., Zhang, T., Salzmann, M., Süsstrunk, S.: TempSAL: Uncovering temporal information for deep saliency prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6461–6470 (2023)
- Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12919–12928 (2021)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2088–2097 (2022)
https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. 
[2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. 
[2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. 
[2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Kümmerer et al. [2016] Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: Reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)
- Linardos et al. [2021] Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021)
- Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V, pp. 419–435 (2020). Springer
- Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III, pp. 87–97 (2020). Springer
- Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Linardos, A., Kümmerer, M., Press, O., Bethge, M.: DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12919–12928 (2021) Droste et al. [2020] Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou et al. [2022] Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. 
[2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. 
[2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 419–435 (2020). Springer Cao et al. [2020] Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. 
[2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
- Cao, G., Tang, Q., Jo, K.-h.: Aggregated deep saliency prediction by self-attention network. In: Intelligent Computing Methodologies: 16th International Conference, ICIC 2020, Bari, Italy, October 2–5, 2020, Proceedings, Part III 16, pp. 87–97 (2020). Springer Lou and et al. [2022] Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lou, J., al.: Transalnet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022) Lévêque and Liu [2019] Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989 Liang et al. [2021] Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. 
[2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021) Kou et al. [2023] Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. 
[2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. 
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. 
[2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
- Lou, J., et al.: TranSalNet: Towards perceptually relevant visual saliency prediction. Neurocomputing 494, 455–467 (2022)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019) https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021)
- Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Le Callet, P.: Why is gaze influenced by image transformations? Dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision (ICCV), pp. 2106–2113 (2009)
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. In: Eye-tracking in Audiovisual Translation, pp. 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
[2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. 
[2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Lévêque, L., Liu, H.: An eye-tracking database of video advertising. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 425–429 (2019). https://doi.org/10.1109/ICIP.2019.8802989
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023). https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- Singh, S.: Impact of color on marketing.
Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Liang, S., Liu, R., Qian, J.: Fixation prediction for advertising images: Dataset and benchmark. Journal of Visual Communication and Image Representation 81, 103356 (2021)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Kou, Q., Liu, R., Lv, C., Jiang, H., Cheng, D.: Advertising image saliency prediction method based on score level fusion. IEEE Access 11, 8455–8466 (2023) https://doi.org/10.1109/ACCESS.2023.3236807 Jiang et al. [2022] Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao et al. [2022] Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021) Shen et al. [2021] Shen, Z., et al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text.
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Jiang, L., Li, Y., Li, S., Xu, M., Lei, S., Guo, Y., Huang, B.: Does text attract attention on e-commerce images: A novel saliency prediction dataset and method. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2088–2097 (2022) Liao and et al. [2022] Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Liao, M., al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. 
[2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) Dosovitskiy et al. [2021] Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. 
[2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. 
[2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Liao, M., et al.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 919–931 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: 2021 International Conference on Learning Representations (ICLR) (2021) Shen and et al. [2021] Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. 
[2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. 
Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. 
[2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Shen, Z., al.: Efficient attention: Attention with linear complexities. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2021) Che et al. [2020] Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. 
[2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management decision 44(6), 783–789 (2006) Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. 
In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. 
Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Che, Z., Borji, A., Zhai, G., Min, X., Guo, G., Callet, P.L.: Why is gaze influenced by image transformations? dataset and model. IEEE Transactions on Image Processing 29, 2287–2300 (2020) Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019) Wang et al. [2022] Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: Logodet3k: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022) Jiang et al. [2015] Jiang, M., Huang, S., Duan, J., Zhao, Q.: Salicon: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015) Borji and Itti [2015] Borji, A., Itti, L.: Cat2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015) Judd et al. [2009] Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE Judd et al. [2012] Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012) Lautenbacher [2012] Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. 
Management Decision 44(6), 783–789 (2006)
- Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
- Wang, J., Min, W., Hou, S., Ma, S., Zheng, Y., Jiang, S.: LogoDet-3K: A large-scale image dataset for logo detection. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1), 1–19 (2022)
- Jiang, M., Huang, S., Duan, J., Zhao, Q.: SALICON: Saliency in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1072–1080 (2015)
- Borji, A., Itti, L.: CAT2000: A large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
- Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113 (2009). IEEE
- Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations (2012)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in Neural Information Processing Systems 20 (2007)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical English text. Journal of Vision 10(2), 21–21 (2010)
- Singh, S.: Impact of color on marketing. Management Decision 44(6), 783–789 (2006)
- Lautenbacher, O.P.: From still pictures to moving pictures. Eye-tracking in Audiovisual Translation, 135–155 (2012) Cerf et al. [2007] Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. Advances in neural information processing systems 20 (2007) Yu et al. [2010] Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Yu, D., Park, H., Gerold, D., Legge, G.E.: Comparing reading speed for horizontal and vertical english text. Journal of vision 10(2), 21–21 (2010) Singh [2006] Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006) Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)
- Singh, S.: Impact of color on marketing. Management decision 44(6), 783–789 (2006)